Morrie Gasser of the MITRE Corporation has written a set of
programs that are capable of generating pronounceable English
words at random. Enclosed with this MTB is the draft
documentation for the various modules which comprise the word
generator. Comments on the user interface are especially
welcome; send them to Green.Druid and Gasser.Druid on the MIT
Multics system.

The random word generator (random_word_) is a table-driven
program that returns an array of numbers (units) which form a
word. The units are supplied by a caller-specified subroutine.
The standard version of this subroutine is named random_unit_,
although there is no requirement that the units themselves be
random.

The parameters to random_word_ are the number of letters
that may appear in the generated word, and the random_unit_
subroutine. The random_word_ routine calls random_unit_
repeatedly to get units, each time determining from a "digram
table" whether the returned unit may be added to the end of the
word being generated. Units which satisfy the rules encoded in
the digram table are added to the end of the generated word;
units which do not satisfy the rules are ignored. Units are
requested until the length in letters meets the caller's
criteria.
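
The accept/reject loop described above can be sketched as follows.
This is a minimal Python illustration, not the Multics PL/I
implementation: the four-unit table and the single "illegal pair"
rule are hypothetical stand-ins for the much richer rules of the
real digram table, and ignoring a unit that would overshoot the
requested length is an assumption.

```python
import random

# Hypothetical four-unit table (unit index -> letters).
UNITS = {1: "a", 2: "b", 3: "sh", 4: "e"}
# Hypothetical rule set: only "illegal pair" flags, standing in for the digram table.
ILLEGAL_PAIRS = {(1, 1), (3, 3)}


def random_word(n_letters, random_unit):
    """Build a list of unit indices whose letters total n_letters."""
    word = []
    letters = 0
    while letters < n_letters:
        unit = random_unit()
        # Ignore a unit that forms an illegal pair with the previous unit.
        if word and (word[-1], unit) in ILLEGAL_PAIRS:
            continue
        # Assumption: a unit that would overshoot the length is also ignored.
        if letters + len(UNITS[unit]) > n_letters:
            continue
        word.append(unit)
        letters += len(UNITS[unit])
    return word


rng = random.Random(1)
word = random_word(6, lambda: rng.randint(1, 4))
print("".join(UNITS[u] for u in word))
```

The unit supplier is passed in as a callable, mirroring the way
random_word_ takes its unit routine from the caller.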



The table that drives random_word_ is referenced as an
external array with the name "digrams_". This table can be
prepared by the user by creating an ASCII segment specifying the
rules, and compiling it with the digram_table_compiler. The
digram table is in two parts. The first part specifies the
one- or two-letter symbols that define each unit, and some flags
that define various rules for each unit. The second part lists
every possible pair of these units (i.e., if there are n units
then there are n*n pairs), and contains several more flags for
each pair that define rules about combining pairs.



Multics Project internal working documentation. Not to be 
reproduced or distributed outside the Multics Project. 



MTB-194

Multics Technical Bulletin



Only the digram table itself is specifically English-oriented;
the symbolic representation of the units and letters is
unimportant to the digram_table_compiler and random_word_ (except
that the number of letters in each unit is used to determine how
long the generated word is). The random_word_ and random_unit_
subroutines operate upon unit indices, not the actual ASCII
characters. These unit indices may be converted back to their
character representations by calling the convert_word_ subroutine.

As the word generator currently exists, the random_unit_
subroutine "knows" what units exist in the digram table, what
their frequencies of occurrence are, and which ones have specific
attributes. Thus it does not have to reference the digram table.
For that reason, if it is desired to replace the digram table, the
random_unit_ subroutine must also be replaced. Some of these
dependencies could have been eliminated by having the
random_unit_ subroutine reference the digram table on the first
call to determine which units exist, but this was not done for
reasons of efficiency. The only unit attribute that random_unit_
cares about is the "vowel" attribute, for the entrypoint
random_unit_$random_vowel. For these reasons, a new digram table
can be created (without replacing random_unit_) only if the
English-letter representation of the units, and the order of the
units, is not modified.

Note that only the command interface (generate_words) will
be user-visible; the rest of the modules will remain internal
interfaces.
Name: generate_word_

This subroutine returns a random pronounceable word as an 
ASCII character string. It also returns the same word split by 
hyphens into syllables as an aid to pronunciation. 

Usage 

declare generate_word_ entry (char(*), char(*), fixed bin,
        fixed bin);

call generate_word_ (word, hyphenated_word, min, max);

1) word             is the random word, padded on the right with
                    blanks. This string must be long enough to
                    hold the word (at least as long as max).
                    (Output)

2) hyphenated_word  is the same word split into syllables. The
                    length of this string must be greater than
                    max to allow for the hyphens. A length of
                    3*max/2 + 1 will always be sufficient.
                    (Output)

3) min              is the minimum length of the word to be
                    generated. This value must be greater than 3
                    and less than 21. (Input)

4) max              is the maximum length of the word to be
                    generated. The actual length of the word
                    will be uniformly random between min and max.
                    The value of max must be greater than or
                    equal to min, and less than 21. (Input)
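
The argument constraints above can be checked before a call. The
following Python sketch is illustrative only: the helper names are
hypothetical, and integer division in the 3*max/2 + 1 formula is an
assumption.

```python
def lengths_are_valid(min_len, max_len):
    """True when 3 < min_len <= max_len < 21, as the argument list requires."""
    return 3 < min_len < 21 and min_len <= max_len < 21


def hyphenated_length(max_len):
    """String length stated to be always sufficient for the hyphenated word."""
    # Assumption: the division in 3*max/2 + 1 truncates to an integer.
    return 3 * max_len // 2 + 1


print(lengths_are_valid(6, 8), hyphenated_length(8), hyphenated_length(20))
```

A caller would size the hyphenated_word string from the largest
max it ever intends to pass.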

Each call to generate_word_ should produce a different 
random word, regardless of when the call is made. However, as 
with any random generator, there is no guarantee that there will 
be no duplicates. The probability of duplication is greater with
shorter words. 
Entry: generate_word_$init_seed

This entry allows the user to specify a starting seed for
generating random words. If a seed is specified, the exact same
sequence of random words will always be generated on subsequent
calls to generate_word_, provided the same values of min and max
are specified. If this entry is not called in a process, the
value of the clock is used as the initial seed on the first call
to generate_word_, thereby "guaranteeing" different sequences of
words in different processes.

declare generate_word_$init_seed entry (fixed bin(35));

call generate_word_$init_seed (seed);

1) seed             is the initial seed value. If zero, the system
                    clock will be used as the seed. (Input)
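
The seeding behavior can be illustrated with any seedable
generator. The Python sketch below is an assumption-laden analogy:
Python's own generator stands in for the (unspecified) Multics one,
the function names are hypothetical, and only the word lengths are
drawn, not the words themselves.

```python
import random
import time


def make_generator(seed):
    """Return a seeded generator; a zero seed falls back to the clock."""
    if seed == 0:
        seed = time.time_ns()
    return random.Random(seed)


def word_lengths(seed, min_len, max_len, count):
    """Draw `count` word lengths uniformly between min_len and max_len."""
    rng = make_generator(seed)
    return [rng.randint(min_len, max_len) for _ in range(count)]


first = word_lengths(12345, 6, 8, 5)
second = word_lengths(12345, 6, 8, 5)
print(first == second)
```

The same non-zero seed with the same min and max reproduces the
same sequence, which is the property the entry documents.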
Name: generate_words, gw

This command will print random pronounceable "words" on the
user's terminal.



Usage 

generate_words -control_args-

1) control_args     may be selected from the following:

   nwords           is the number of words to print. If not
                    specified, one word is printed.

   -min n           specifies the minimum length, in characters,
                    of the words to be generated.

   -max n           specifies the maximum length of the words to
                    be generated.

   -length n, -ln n specifies the length of the words to be
                    generated. If this argument is specified, all
                    words will be this length, and -min or -max
                    may not be specified.

   -hyphenate, -hph causes the hyphenated form (divided into
                    syllables) of each word to be printed
                    alongside the original word.

   -seed SEED       On the first call to generate_words in a
                    process, the system clock is used to obtain a
                    starting "seed" for generating random words.
                    This seed is updated for every word generated,
                    and subsequent values of the seed depend on
                    previous values (in a rather complex way). If
                    the -seed argument is specified, SEED must be
                    a positive decimal integer. For a given value
                    of SEED, the sequence of random words will
                    always be the same, provided the same length
                    values are specified. When no -seed argument
                    is specified, the last value of the updated
                    seed from the previous call to generate_words
                    will be used. To revert to using the system
                    clock as the seed, specify a zero value for
                    SEED, i.e., -seed 0.

Notes 

If none of -min, -max, or -length is specified, the
defaults are -min 6 and -max 8. In all other cases, the defaults
are -min 4 and -max 20.

If -length is not specified, the lengths of the random words
will be uniformly distributed between min and max. Words
generated are printed one per line, with the hyphenated forms, if
specified, lined up in a column alongside the original words.
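
The defaulting rules in these notes can be written out as a small
decision function. This Python sketch is illustrative; the function
name and the use of None for "not specified" are assumptions, and it
reads the garbled default as -min 4.

```python
def resolve_lengths(min_len=None, max_len=None, length=None):
    """Return the (min, max) pair implied by the control arguments."""
    if length is not None:
        # -length excludes -min and -max; every word has this length.
        if min_len is not None or max_len is not None:
            raise ValueError("-length may not be combined with -min or -max")
        return length, length
    if min_len is None and max_len is None:
        return 6, 8
    # One of -min/-max was given: the other takes the wide default.
    return (4 if min_len is None else min_len,
            20 if max_len is None else max_len)


print(resolve_lengths(), resolve_lengths(min_len=10), resolve_lengths(length=7))
```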
Name: convert_word_

This subroutine is used to convert the random word array
returned by random_word_ to ASCII.

Usage

dcl convert_word_ entry ((0:*) fixed bin, (0:*) bit(1)
        aligned, fixed bin, char(*), char(*));

call convert_word_ (word, hyphenated_word, word_length,
        ascii_word, ascii_hyphenated_word);

1) word             Array of random units returned from a previous
                    call to random_word_. (Input)

2) hyphenated_word  Array of bits indicating where hyphens are to
                    be placed, returned from random_word_. (Input)

3) word_length      Number of units in word, returned from
                    random_word_. (Input)

4) ascii_word       This string will contain the word, left justified,
                    with trailing blanks. This string should be long
                    enough to hold the longest word that may be
                    returned. This is normally the value of "maximum"
                    supplied to random_word_. (Output)

5) ascii_hyphenated_word
                    This string will contain the word, with
                    hyphens between the syllables, left justified
                    within the string. The length of this string
                    should be at least 3*maximum/2+1 to guarantee that
                    the hyphenated word will fit. (Output)
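
The conversion can be pictured with a short Python sketch. It is
illustrative only: the LETTERS list copies the standard unit order
from the random_unit_ table in this bulletin, padding with blanks is
left out, and the rule that no hyphen follows the final unit is an
assumption drawn from the phrase "hyphens between the syllables".

```python
# Index 0 is unused, matching word(0), which is always zero.
LETTERS = ["", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l",
           "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "x", "y", "z",
           "ch", "gh", "ph", "rh", "sh", "th", "wh", "qu", "ck"]


def convert_word(word, hyphens, word_length):
    """Return (ascii_word, ascii_hyphenated_word) for a unit array."""
    plain = "".join(LETTERS[word[i]] for i in range(1, word_length + 1))
    parts = []
    for i in range(1, word_length + 1):
        parts.append(LETTERS[word[i]])
        # A set bit marks the last unit of a syllable; skip the final unit.
        if hyphens[i] and i < word_length:
            parts.append("-")
    return plain, "".join(parts)


# Units f, i, sh, e, r with syllable ends after "i" and after "r".
print(convert_word([0, 6, 9, 30, 5, 17], [0, 0, 1, 0, 0, 1], 5))
```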

Entry: convert_word_$no_hyphens

This entry can be used to obtain the ASCII form of a random
word without the hyphenated form.



MTB-194      Honeywell Information Systems, Inc.

convert_word_      MPLM SYSTEM TOOLS      Subroutine, Page 2, 05/08/75



dcl convert_word_$no_hyphens entry ((0:*) fixed bin, fixed bin,
        char(*));

call convert_word_$no_hyphens (word, word_length,
        ascii_word);

Arguments are the same as above.
Name: convert_word_char_

This subroutine facilitates printing of the hyphenated word
returned from a call to hyphenate_.

Usage

dcl convert_word_char_ entry (char(*), (*) bit(1) aligned,
        fixed bin, char(*) varying);

call convert_word_char_ (word, hyphens, last, result);

1) word             This string is the word to be hyphenated. (Input)

2) hyphens          This is the array returned from a call to
                    hyphenate_ that marks characters in word after
                    which hyphens are to be inserted. (Input)

3) last             This is the status code returned from hyphenate_.
                    If negative, the result will be the original word,
                    unhyphenated, with ** following it. If positive,
                    the word will be returned hyphenated, but with an
                    asterisk preceding the last'th character. If
                    zero, the word will be returned hyphenated without
                    any asterisks. (Input)

4) result           This string contains the resultant hyphenated
                    word. (Output)
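
The three cases for "last" can be sketched as below. This Python
version is a hypothetical illustration of the formatting rules just
listed; in particular, the relative order of a hyphen and an
asterisk that fall at the same point is an assumption.

```python
def convert_word_char(word, hyphens, last):
    """Format `word` using hyphen marks and a hyphenate_-style status code."""
    if last < 0:
        # Illegal word: return it unhyphenated, followed by "**".
        return word + "**"
    out = []
    for pos, ch in enumerate(word, start=1):
        if last > 0 and pos == last:
            out.append("*")  # marks the first character that was not accepted
        out.append(ch)
        if hyphens[pos - 1]:
            out.append("-")
    return "".join(out)


print(convert_word_char("fisher", [0, 1, 0, 0, 0, 0], 0))
```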






Name: digram_table_compiler, dtc

This command compiles a source segment containing the
digrams for the random word generator and produces an object
segment with the name "digrams_".

Usage

digram_table_compiler pathname -option-

1) pathname         is the pathname of the source segment. If
                    the suffix ".dtc" does not appear, it will be
                    assumed. Regardless of the name of the
                    source segment, the output segment will
                    always be given the name "digrams_" and will
                    be placed in the working directory.

2) -option-         may be the following:

   -list, -ls       lists the compiled table on the terminal.
                    The table will be printed in columns to fit
                    the terminal line length. If file_output is
                    being used, lines will be 132 characters
                    long.

   -list n, -ls n   lists the table as above, but uses n as the
                    number of columns to print. Each column
                    occupies 14 positions, thus a value of 5 will
                    cause 5 columns to be printed, each line
                    being 70 characters long. This option is
                    useful when file_output is being used, so
                    that the lines produced are not too long to
                    fit on the terminal to be used to print the
                    output file.



Notes 

The compiler makes an attempt to detect inconsistent
combinations of attributes, as well as syntax errors. If an
error is encountered during compilation, processing of the source
segment will continue if possible. The digrams segment in case
of an error will be left in an undefined state.

During compilation, the ALM assembler is used. At that
point the letters "ALM" will be printed on the terminal. If
compilation was successful, no other messages should appear.

The listing produced by digram_table_compiler is in a format
suitable for printing on the terminal, not for dprinting. This
is because blank lines are used for page breaks, instead of the
"new page" character as recognized by dprint.

Syntax 

The syntax of the source segment is specified below. Spaces
are meaningful to this compiler and a space is only allowed where
specified as <space>. The new line character is indicated as
<new line>.

<digram table>::= <unit specs>;[<new line>]...<digram specs>$
<unit specs>::= <unit spec>[<delim><unit spec>]...
<digram specs>::= <digram spec>[<delim><digram spec>]...
<delim>::= ,[<new line>]|<new line>
<unit spec>::= <unit name>[<not begin word>[<no final split>]]
<digram spec>::= [<begin><not begin><break><prefix>]
                 <unit name><unit name>[<suffix>[<end>[<not end>]]]
<unit name>::= <letter>[<letter>]
<letter>::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<not begin word>::= <bit>
<no final split>::= <bit>
<begin>::= <bit>
<not begin>::= <bit>
<break>::= <bit>
<prefix>::= <space>|-
<suffix>::= <space>|-|+
<end>::= <bit>
<not end>::= <bit>
<bit>::= <space>|1

The first part of the <digram table> consists of definitions
of the various units that are to be used and their attributes.
Each unit is defined as one or two letters, and the order
in which the units are defined is unimportant. For each unit, the
attributes <not begin word> and <no final split> may be
specified. In addition, if <unit name> is a, e, i, o, or u, the
"vowel" attribute is set. If the unit is y, the
"alternate vowel" attribute is set. A <bit> is assumed to be
zero if specified as <space>, or one if specified as 1.

The second part of <digram table> specifies all possible
pairs of units and the attributes for each pair. The order in
which these pairs must be specified depends on the order of the
<unit specs> as follows:

Number the <unit spec>s from 1 to n in the order in which
they appeared in <unit specs>. The first <digram spec> must
consist of the pair of units numbered (1,1), the second
<digram spec> is the pair (1,2), etc., and the last <digram spec>
is the pair (n,n). All pairs must be specified, i.e., there must
be n*n <digram spec>s. The <bit>s preceding or following each
pair set the attributes for that pair as shown. The <prefix> and
<suffix> indicators are set to 1 if specified as "-". If
<suffix> is specified as "+", the "illegal pair" indicator will
be set, and no other attributes may be specified for that
<digram spec>.
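
The required ordering of the n*n pairs is a plain row-by-row
enumeration. A minimal Python sketch (the function names are
hypothetical):

```python
def digram_order(unit_names):
    """List all pairs in the order (1,1), (1,2), ..., (n,n)."""
    return [first + second for first in unit_names for second in unit_names]


def digram_position(i, j, n):
    """1-based position of pair (i, j) among the n*n digram specs."""
    return (i - 1) * n + j


print(digram_order(["a", "b", "sh", "e"]))
```

For four units this yields sixteen pairs, beginning aa, ab, ash,
ae, ba.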

Example

The following is a very short example of a <digram table>.
Only four units are defined: "a", "b", "sh" and "e". The letter
"e" is given the "no final split" attribute, the pair "aa" is
given "illegal pair", the pair "ae" is given the "not begin",
"break", and "not end" attributes, etc.

a,b,sh,e 1;
aa+,ab,ash, 11 ae  1
ba, 1  bb, 11 bsh  1,be
sha, 11 shb  1,shsh+,she,ea,eb,esh,ee
$

Assume the above segment was named "dt.dtc". Below is an
example of the command used to compile and list the table
produced for dt.

digram_table_compiler dt -ls
ALM

  1 a  0010     2 b  0000     3 sh 0000     4 e  0110

000 aa  +00   000 ba   00   000 sha  00   000 ea   00
000 ab   00   010 bb   00   011 shb  01   000 eb   00
000 ash  00   011 bsh  01   000 shsh+00   000 esh  00
011 ae   01   000 be   00   000 she  00   000 ee   00

The first line of output lists the individual units. The
number preceding the unit is the unit index. The four bits
following the unit are respectively:

     not begin syllable
     no final split
     vowel
     alternate vowel

Following the unit specifications are the digram specifications.
Preceding each digram are three bits and a space (or possibly a
"-") with meanings corresponding to those specified in the source
segment as follows:

     begin
     not begin
     break
     prefix (if "-" appears)

Immediately following each digram is a field which may be blank,
"-", or "+". If "+" appears, the "illegal pair" flag is set.
Otherwise, the meanings of the "-" and the following two bits are
as follows:

     suffix (if "-" appears)
     end
     not end
Name: hyphen_test

This command uses the random word generator (the same one
used by generate_words) to divide words into syllables. Words
are printed on the terminal with hyphens between the syllables.

Usage

hyphen_test -control_arg- word1 ... wordn

1) control_arg      may be -probability (-pb), specifying that
                    the probability of each of the words that
                    follows be printed alongside the hyphenated
                    word.

2) wordi            are one or more words to be hyphenated. A
                    word may consist of three to twenty
                    alphabetic characters, only the first of
                    which may be uppercase.



Notes 

The control argument may appear anywhere in the command
line. However, it only applies to words that follow. Words
preceding the option will be hyphenated but no probabilities will
be calculated.

If a word contains any illegal characters, or is not of
three to twenty characters in length, the word will be printed
unhyphenated, followed by **.

If the word could not be completely hyphenated because it
was considered unpronounceable, an asterisk (*) will be printed
out in front of the first character that was not accepted. The
part of the word before the asterisk will be properly hyphenated.

The calculated probability is the probability that the word
would have been generated by generate_words, assuming
generate_words was requested to generate a word of that length
only. If a range of lengths is requested of generate_words, each
length has equal probability. For example, if generate_words is
called to generate words of 6, 7, or 8 characters, there is a 33%
probability that a given word will have 8 characters. If
hyphen_test is then asked to calculate the probability of a given
8 letter word, that probability should be divided by 3 to obtain
the correct probability for the case of three possible lengths.
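
The adjustment for a range of lengths is a single division. A
minimal Python sketch, with a hypothetical function name and an
illustrative input value:

```python
def probability_for_range(p_exact, min_len, max_len):
    """Scale a fixed-length probability to a request covering min..max."""
    n_lengths = max_len - min_len + 1  # all lengths are equally probable
    return p_exact / n_lengths


# An illustrative fixed-length probability, for a request of 6 to 8 characters.
print(probability_for_range(6.0e-5, 6, 8))
```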
Name: hyphenate_

This subroutine attempts to hyphenate a word into syllables.

Usage

dcl hyphenate_ entry (char(*), (*) bit(1) aligned, fixed
        bin);

call hyphenate_ (word, hyphens, code);

1) word             This is a left justified ASCII string, 3 to 20
                    characters in length. This string must contain
                    all lowercase alphabetic characters, except the
                    first character may be uppercase. Trailing blanks
                    are not permitted in this string. (Input)

2) hyphens          This array will contain a "1"b for every character
                    in the word that is to have a hyphen following it.
                    (Output)

3) code             This is a status code, as follows:

                     0  word has been successfully hyphenated.
                    -1  word contains illegal (non-alphabetic or
                        uppercase) characters.
                    -2  word was not from three to twenty characters
                        in length.

                    Any positive value of code means that the word
                    couldn't be completely hyphenated. In this case,
                    code is the position of the first character in
                    word that was not acceptable. The part of the
                    word before code will be properly hyphenated.
                    (Output)
Notes 

This subroutine uses random_word_ to provide the
hyphenation. It does this by calling random_word_$give_up and
supplying its own versions of random_unit and random_vowel that
return specified units (of the particular word to be hyphenated)
instead of random units.

The word supplied to hyphenate_ is first transformed into
units by translating pairs of letters into single units if a
2-letter unit is defined for the pair, and then by translating
the remaining single letters into units. See the subroutine
descriptions of random_word_ and random_unit_ for a description of
units. If any units of the word are rejected by random_word_,
hyphenate_ tries to determine if the refused unit was a
2-letter unit. If this is the case, the 2-letter unit is broken
into two 1-letter units and random_word_ is called again. In
rare cases, hyphenate_ is not able to determine which 2-letter
unit is at fault, and will return a status code indicating that
the word is unpronounceable when, in fact, it could have been
properly divided by breaking up a 2-letter unit.
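
The translation of a word into units, and the fallback of breaking
up a 2-letter unit, can be sketched in Python. This is an
illustration under assumptions: the 2-letter units are those of the
standard random_unit_ table, the left-to-right greedy scan is a
guess at how pairs are matched, and both function names are
hypothetical.

```python
# The 2-letter units of the standard table (ch, gh, ph, rh, sh, th, wh, qu, ck).
TWO_LETTER_UNITS = {"ch", "gh", "ph", "rh", "sh", "th", "wh", "qu", "ck"}


def to_units(word):
    """Split a word into units, preferring a defined 2-letter unit."""
    word = word.lower()  # only the first character may be uppercase
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in TWO_LETTER_UNITS:
            units.append(pair)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def split_unit(units, position):
    """Break the 2-letter unit at `position` into two 1-letter units."""
    return units[:position] + list(units[position]) + units[position + 1:]


print(to_units("fish"), split_unit(to_units("fish"), 2))
```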

Entry: hyphenate_$probability

This entry returns information as above, but also supplies
the probability of the word having been generated at random by
generate_word_ or random_word_generator_. The assumption is made
that generate_word_ or random_word_generator_ was asked to supply
a word of exactly the same length as the word given to
hyphenate_, rather than a range of lengths. If a range of
lengths was asked of generate_word_, the probability must be
divided by the number of different lengths (all lengths are
equally probable).

Usage

dcl hyphenate_$probability entry (char(*), (*) bit(1)
        aligned, fixed bin, float bin);

call hyphenate_$probability (word, hyphens, code,
        probability);

1) to 3)            are as above.

4) probability      is the probability as defined above. (Output)
Notes

If the supplied word is illegal (i.e., code is not zero), the
probability will be returned as zero.

Entry: hyphenate_$debug_on, hyphenate_$debug_off

These entries set and reset a switch that causes
hyphenate_$probability to print, on user_output, all units (see
the subroutine descriptions of random_word_ and random_unit_ for
a description of units) that are illegal in a given position of
the word. This entry is useful for debugging a digram table for
random_word_. It makes no assumptions about the information
contained in the digram table with regard to which units are
defined, their distributions, the order of the units, etc.
However, it assumes that a call to random_unit_$probabilities will
return arrays of the size digrams_$n_units containing the
probabilities of the units that are defined. See the subroutine
description of random_unit_ for a description of the
random_unit_$probabilities entry, and the subroutine description of
random_word_ for a description of digrams_.

dcl hyphenate_$debug_on entry;
dcl hyphenate_$debug_off entry;

call hyphenate_$debug_on;
call hyphenate_$debug_off;

An example of the output produced is as follows. The
assumption is that hyphenate_$probability is invoked by the
hyphen_test command using the -probability option.

hyphenate_$debug_on
hyphen_test -probability fish

x,ck,f; b,c,d,f,g,h,j,k,m,n,p,s,t,v,w,x,y,z,ch,gh,ph,
rh,sh,th,wh,qu,ck,i; l,rh,wh,qu,sh;
fish 6.04127576e-5

In the above example, the units x and ck are shown to have been
illegal as the first unit of the word, and the unit f
(underlined in the original listing) is the first unit of the
word that was accepted. All other units that were not printed are
legal as the first unit of the word. Following the semicolon
after f are the units that are illegal in the second position of
the word (assuming that f is the first unit). Then i is shown as
the legal unit that is taken from the word "fish". This repeats
for each position of the word, ending in the legal unit sh (note
only one underline).

If the supplied word is illegal, the last underlined letter
in the output is (usually) the letter that was not accepted. In
cases where hyphenate_ has to split up a 2-letter unit, the word
will be shown to start over from the beginning.
Name: print_digram_table

This entry merely prints the digram table on the terminal,
assuming that it has already been compiled successfully. The
segment "digrams_" is assumed to be located in the working
directory.

Usage

print_digram_table -n-

1) n                is the number of columns in which to print the
                    table. If not specified, the maximum number of
                    columns that will fit in the terminal line will
                    be used. Each column occupies 14 positions. If
                    file_output is being used, the terminal line
                    width is assumed to be 132.

Notes 



This entry performs the same function as the -list option of
digram_table_compiler.
Name: random_unit_

This subroutine provides a random unit number for
random_word_ based on a standard distribution of a given set of
units. It is referenced by the generate_word_ subroutine as an
entry value that is passed in the call to random_word_. This
subroutine assumes that the digram table being used by
random_word_ is a standard table. The digram table itself is not
referenced by this subroutine.

declare random_unit_ entry (fixed bin);

call random_unit_ (unit);

1) unit             is a number from 1 to 34 that corresponds to a
                    particular unit as listed in Notes below. (Output)

Notes

The table below contains the units that are assumed
specified in the digrams supplied to random_word_. Shown in the
table are the unit number, the letter or letters that unit
represents, and the probability of that unit number being
generated.



 1 a  .04739    8 h  .02844   15 o  .04739   22 w  .03792   29 rh .00474
 2 b  .03792    9 i  .04739   16 p  .02844   23 x  .00474   30 sh .00948
 3 c  .05687   10 j  .03792   17 r  .04739   24 y  .03792   31 th .00948
 4 d  .05687   11 k  .03792   18 s  .03792   25 z  .00474   32 wh .00474
 5 e  .05687   12 l  .02844   19 t  .04739   26 ch .00474   33 qu .00474
 6 f  .03792   13 m  .02844   20 u  .02844   27 gh .00474   34 ck .00474
 7 g  .03792   14 n  .04739   21 v  .03792   28 ph .00474
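
Drawing a unit number according to this table amounts to weighted
sampling. The Python sketch below is an illustration, not the
Multics routine: it uses the standard library's random.choices, and
the function signature (taking a generator, returning the unit
number) is an assumption.

```python
import random

# Unit letters and probabilities, in unit-number order (1 to 34).
UNIT_PROBS = {
    "a": .04739, "b": .03792, "c": .05687, "d": .05687, "e": .05687,
    "f": .03792, "g": .03792, "h": .02844, "i": .04739, "j": .03792,
    "k": .03792, "l": .02844, "m": .02844, "n": .04739, "o": .04739,
    "p": .02844, "r": .04739, "s": .03792, "t": .04739, "u": .02844,
    "v": .03792, "w": .03792, "x": .00474, "y": .03792, "z": .00474,
    "ch": .00474, "gh": .00474, "ph": .00474, "rh": .00474, "sh": .00948,
    "th": .00948, "wh": .00474, "qu": .00474, "ck": .00474,
}
UNITS = list(UNIT_PROBS)            # UNITS[k - 1] is the letters of unit k
WEIGHTS = list(UNIT_PROBS.values())


def random_unit(rng):
    """Return a unit number from 1 to 34 with the tabulated probabilities."""
    return rng.choices(range(1, len(UNITS) + 1), weights=WEIGHTS)[0]


rng = random.Random(0)
unit = random_unit(rng)
print(unit, UNITS[unit - 1])
```

The tabulated probabilities sum to 1 within rounding, so they can
be used as weights directly.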
Entry: random_unit_$random_vowel

This entry returns a vowel unit number only.

declare random_unit_$random_vowel entry (fixed bin);
call random_unit_$random_vowel (unit);
1) unit             As above. (Output)

Below are listed the vowel units and their distributions.



 1 a  .167
 5 e  .250
 9 i  .167
15 o  .167
20 u  .167
24 y  .083



Entry: random_unit_$probabilities

This entry returns arrays containing the probabilities of
the units as listed in the tables above. This
entry is provided for hyphenate_$probability and any other
program that might require this information. The probabilities
must be computed when this entry is called, so it is suggested
that the call be made only once per process and the values saved
in internal static storage.

declare random_unit_$probabilities entry ((*) float bin, (*)
        float bin);

call random_unit_$probabilities (unit_probs, vowel_probs);

1) unit_probs       This array contains the probabilities of the
                    individual units assuming the random_unit_ entry
                    is called to generate the random units. The value
                    of unit_probs(i) is the probability of unit(i).
                    (Output)

2) vowel_probs      This array contains the probabilities of the units
                    when random_vowel is called. Since there are only
                    6 vowels, most of these values will be zero.
                    (Output)

Notes 

A future version of random_unit_ may use different units
with different probabilities. The size of the two arrays must be
large enough to hold the maximum number of values that may be
returned by random_unit_ (which is currently 34). Programs
should not depend on the unit_index-to-letter correspondence as
shown in the table. This information can be obtained by using
the include file digram_structure.incl.pl1.
Name: random_word_

This routine returns a single random pronounceable word of
specified length. It is called by generate_word_, and allows the
caller to specify the particular subroutines to be used to
generate random units. For users desiring random words with an
English-like distribution of letters, generate_word_ should be
used.



Usage 



dcl random_word_ entry ((0:*) fixed, (0:*) bit(1) aligned,
        fixed, fixed, entry, entry);

call random_word_ (word, hyphens, char_length, unit_length,
        random_unit, random_vowel);



1) word             The random word will be stored in this array
                    starting at word(1) (word(0) will always be 0).
                    The numbers stored will correspond to a "unit
                    index" as described in Notes below. This array
                    must have a length at least equal to the value of
                    "char_length". Unused positions in this array, up
                    to word(char_length), will be set to zero.
                    (Output)

2) hyphens          This array must be of length at least
                    "char_length". A bit on in a position of this
                    array indicates that the corresponding unit in
                    "word" (including the very last unit) is the last
                    unit of a syllable. (Output)

3) char_length      Length of the word to be generated, in characters.
                    (Input)

4) unit_length      This is the length of the generated random word in
                    units, i.e., the index of the last non-zero entry
                    in the "word" array. The actual length of the
                    word in equivalent characters will be the value of
                    char_length. (Output)
5) random_unit      This is the routine that will be called by
                    random_word_ each time a random unit is needed.
                    The random_unit routine is declared as follows:

                    dcl random_unit entry (fixed bin);

                    where the value returned is a unit index between 1
                    and n_units. If an English-like distribution of
                    letters is desired, the "random_unit_" subroutine
                    may be specified here. See Notes below. (Input)

6) random_vowel     This is the routine called by random_word_ when a
                    vowel unit is required. This routine must return
                    the index of a unit whose "vowel" or
                    "alternate_vowel" bits are on. See Notes below.
                    This routine is declared as follows:

                    dcl random_vowel entry (fixed bin);

                    If desired, the subroutine
                    "random_unit_$random_vowel" may be specified in
                    this place. (Input)

Notes 

The word array can be converted into characters by calling
convert_word_.

In order to use random_word_, a digram table, contained in a
segment named "digrams_", must be available in the search path.
This table can be created by the digram_table_compiler.

If the user supplies his own versions of random_unit and
random_vowel, these subroutines will have to supply legal units
that are recognized by the random_word_ subroutine. The include
file "digram_structure.incl.pl1" can be used to reference the
digram table to determine which units are available. If included
in the source program, appropriate references to the following
variables of interest in "digrams_" will be generated:

dcl n_units fixed bin defined digrams_$n_units;
dcl letters(0:n_units) char(2) aligned
        based (addr(digrams_$letters));
dcl 1 rules(n_units) aligned based(addr(digrams_$rules)),
      2 vowel bit(1),
      2 alternate_vowel bit(1),



where:

n_units             is the number of different units.

letters(i)          contains 1 or 2 characters (left justified)
                    for the i'th unit.

rules.vowel(i),
rules.alternate_vowel(i)
                    One of these two bits is set for the units
                    that may be returned by a call to
                    random_vowel.



When random_unit is called, a number from 1 to n_units must
be returned. When random_vowel is called, a number from 1 to
n_units, where one of the two bits in rules(i) is marked, must be
returned.

Entry: random_word_$debug_on

This entry sets a switch in random_word_ that causes
printing (on user_output) of partial words that could not be
completed. This entry is of interest during debugging of
random_word_ or for checking the consistency of the digram table
prepared by the user.



Usage

dcl random_word_$debug_on entry;

call random_word_$debug_on;

Entry: random_word_$debug_off



This entry resets the switch set by debug_on. 

Additional notes

The random_word_ subroutine can be used for certain special
applications (such as the application used by hyphenate_), and
there are certain features that help support some of these
applications. The features described below are of little
interest to most users.

The first feature allows the caller-supplied random_unit
(and random_vowel) subroutine to find out whether random_word_
"accepted" or "rejected" the previous unit supplied by
random_unit. Each time random_unit is invoked by random_word_,
the value of the argument passed is the index of the previous
unit that random_unit returned (or zero on the first call to
random_unit in a given invocation of random_word_). The sign of
the argument will be positive if this last unit was accepted.
"Accepted" means that the last unit was inserted into the random
word and the word index maintained by random_word_ was
incremented. Once a unit is accepted, it is never removed. Thus
a positive value of the unit index passed to random_unit means
that a unit for the next position of the word is requested.

If the unit index passed to random_unit has a negative sign,
the last unit was rejected according to the rules used by
random_word_ and information supplied in the digram table. If
the unit is rejected, random_word_ does not advance its word
index and calls random_unit again for another unit for that same
word position. With this information random_unit can keep track
of the "progress" of the word being generated.

The feature described above is used by the special
random_unit routine provided by hyphenate_. Since the
random_unit routine for hyphenate_ is not really supplying random
units (but is supplying units of the word to be hyphenated), it
must know whether any particular unit is rejected by
random_word_. Rejection then implies that the word is illegal
according to random_word_ rules.
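
The sign convention can be illustrated with a small Python model of
a random_unit routine that replays the units of a known word, in
the spirit of the routine described for hyphenate_. This is a
sketch under assumptions, not the hyphenate_ code: the names
make_replay_unit and state are hypothetical, and the routine
returns its unit rather than updating an in/out argument as the
PL/I entry does.

```python
def make_replay_unit(units):
    """Return (callback, state) for a random_unit-style routine replaying `units`.

    The callback receives the signed index of the unit it returned last time:
    positive means accepted, negative means rejected, zero means first call.
    """
    state = {"pos": 0, "rejected": False}

    def random_unit(prev):
        if prev > 0:
            state["pos"] += 1          # accepted: supply the next word position
        elif prev < 0:
            state["rejected"] = True   # rejected: the word breaks the rules
        # prev == 0 is the first call for this word; nothing to record yet.
        pos = min(state["pos"], len(units) - 1)
        return units[pos]

    return random_unit, state
```

For example, after make_replay_unit([4, 1, 2]) the calls
random_unit(0) and random_unit(4) supply units 4 and 1. A later
call random_unit(-1) reports that unit 1 was rejected, so the
routine marks the word as illegal and stays at the same position.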

The second feature allows random_unit to "try" a certain
unit without committing that unit to actually be used in the
random word. The sign of each unit supplied to random_word_ by
random_unit is checked. If the sign of the unit is positive,
random_word_ will accept or reject the unit according to its
rules, and will indicate this on the subsequent call to
random_unit.

If the sign of the unit passed to random_word_ is negative,
random_word_ will merely indicate (on the subsequent call to
random_unit) whether that unit would have been accepted, but it
never actually updates the word index. In other words,
random_word_ always rejects the unit, but lets random_unit know
whether the unit was acceptable.

This latter feature is used by hyphenate_$probability in
order to determine which of all possible units are acceptable in
a given position of the word. The random_unit routine used by
hyphenate_$probability tries all possible units in each word
position, and only allows random_word_ to accept the unit that
actually appears in that position.
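
The "try" protocol can be sketched in Python as follows. Everything
here is a hypothetical model rather than the Multics code:
toy_random_word stands in for random_word_ and counts units instead
of letters, no_repeat is an invented rule replacing the digram
table, and make_probing_unit imitates the probing behavior
described for hyphenate_$probability.

```python
def no_repeat(word, unit):
    """Hypothetical rule: a unit may not directly follow itself."""
    return not word or word[-1] != unit


def toy_random_word(random_unit, length, is_allowed, max_calls=1000):
    """Simplified stand-in for random_word_ (max_calls only guards the sketch)."""
    word, prev = [], 0
    for _ in range(max_calls):
        if len(word) == length:
            break
        unit = random_unit(prev)
        index = abs(unit)
        ok = is_allowed(word, index)
        if unit > 0 and ok:
            word.append(index)         # committed: the word index advances
        # A negative unit is never added; only its acceptability is reported.
        prev = index if ok else -index
    return word


def make_probing_unit(actual_units, n_units):
    """Try every unit in each position, then commit the unit that belongs there."""
    acceptable = [[] for _ in actual_units]
    state = {"pos": 0, "probe": 1, "last": None}

    def random_unit(prev):
        if state["last"] == "probe" and prev > 0:
            acceptable[state["pos"]].append(prev)   # probe would have fit
        elif state["last"] == "commit" and prev > 0:
            state["pos"] += 1                       # real unit was accepted
            state["probe"] = 1
        if state["probe"] <= n_units:
            unit = state["probe"]
            state["probe"] += 1
            state["last"] = "probe"
            return -unit                            # negative sign: try only
        state["last"] = "commit"
        return actual_units[state["pos"]]           # positive sign: commit

    return random_unit, acceptable
```

Running toy_random_word with make_probing_unit([1, 2], 3) and the
no_repeat rule builds the word [1, 2] and records [1, 2, 3] as
acceptable in the first position and [2, 3] in the second.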

Name: read_table_

This subroutine is the compiler for the digram table for
random_word_. It is called by digram_table_compiler.

declare read_table_ entry (ptr, fixed bin(24)) returns
(bit(1));

flag = read_table_ (source_ptr, bitcount);

1) source_ptr is a pointer to the source segment to be compiled.
(Input)

2) bitcount is the bit count of the source segment. (Input)

3) flag is "0"b if compilation was successful. It is "1"b
if an error was encountered.

Notes 

If compilation was successful, the compiled table will be
placed in the working directory with the name "digrams_". If
unsuccessful, the digrams_ segment may or may not have been
created, and may be left in an inconsistent state (i.e., unusable
by random_word_). Error messages are printed out on user_output
as the errors are encountered, except that file system errors are
printed on error_output.

This subroutine uses the ALM assembler for part of its work.
As a result, the letters "ALM" will be printed on user_output
sometime during the compilation.